Entry Name:  "SJU-Yeon-MC2"

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

Hanbyul Yeon, Sejong University, hbyeon109@sju.ac.kr, PRIMARY

Seokyeon Kim, Sejong University, ksy0586@sju.ac.kr

Mingyu Pi, Sejong University, pmg9405@sju.ac.kr

Sangbong Yoo, Sejong University, usangbong@sju.ac.kr

Yun Jang, Sejong University, jangy@sejong.edu



Student Team:  Yes 

 

Did you use data from both mini-challenges? No

 

Analytic Tools Used:

Gephi, http://gephi.github.io/

Tableau, http://www.tableau.com/

Visual Analytics framework developed at Sejong University.

 

Approximately how many hours were spent working on this submission in total?

2 weeks

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes 

 

Video Download

Video:

 http://vis.sejong.ac.kr/sju-yeon-mc2-video.wmv

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

 

Questions

 

MC2.1 – Identify those IDs that stand out for their large volumes of communication.  For each of these IDs

 

      a.      Characterize the communication patterns you see.

      b.      Based on these patterns, what do you hypothesize about these IDs?

 

Limit your response to no more than 4 images and 300 words.

 

 

The IDs that stand out for their large volumes of communication are ID 1278894 and ID 839736. Figure 1-1 presents the communication counts by sender in (a) and by receiver in (b). The bar lengths encode the counts, and these two IDs clearly stand out for their large communication volumes. To characterize the communication patterns, we plot the communication counts over time in Figures 1-2 and 1-3. Figure 1-2 shows the communication between ID 1278894 and other visitors, whereas Figure 1-3 shows that between ID 839736 and other visitors. As shown in Figure 1-2, ID 1278894 sent messages regularly every 5 minutes for an hour and received messages from visitors between the send timestamps. In contrast, ID 839736 sent and received messages at the same time, and its sending pattern roughly followed its receiving pattern, as if ID 839736 and other visitors were in conversation. Based on these patterns, we hypothesize that ID 1278894 was an administrator in charge of events who sent and received event-related information, and that ID 839736 ran a hotline (for example, Q&A or waiting times) and communicated directly with visitors.
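The counting step behind a chart like Figure 1-1 can be sketched as follows. This is only a minimal illustration, not the Tableau workflow actually used: the record layout (timestamp, sender, receiver), the function names, and the toy IDs "A", "B", and "C" are assumptions made for the example.

```python
from collections import Counter

# Toy records, invented for illustration: (timestamp, sender_id, receiver_id).
records = [
    ("10:00", "A", "B"),
    ("10:01", "A", "C"),
    ("10:02", "B", "A"),
    ("10:03", "C", "A"),
    ("10:04", "B", "C"),
]


def volume_by_role(rows):
    """Return (sent_counts, received_counts), each keyed by ID."""
    sent = Counter(sender for _, sender, _ in rows)
    received = Counter(receiver for _, _, receiver in rows)
    return sent, received


def top_ids(rows, n=2):
    """Return the n IDs with the largest combined sent + received volume."""
    sent, received = volume_by_role(rows)
    total = sent + received
    return [id_ for id_, _ in total.most_common(n)]
```

Sorting the two counters separately gives the bar orderings for the sender and receiver panels, while the combined total ranks IDs by overall volume.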

Figure 1-1 (a) Communication counts by senders. (b) Communication counts by receivers.


 

 

Figure 1-2 Communication pattern between ID 1278894 and visitors.

 

 

Figure 1-3 Communication pattern between ID 839736 and visitors.

 

 

 

 

 

 

 

 

MC2.2  Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report,
please prioritize those patterns that are most likely to relate to the crime.

 

Limit your response to no more than 10 images and 1000 words.

 

 

To differentiate the communication patterns, we use modularity, a measure of network structure. Modularity indicates how a network can be divided into modules such as groups, clusters, or communities. For a given division of the network's vertices into modules, modularity reflects the concentration of edges within modules compared with a random distribution of links between all nodes regardless of module. We computed the modularity using Gephi, an open graph visualization platform. Figure 2-1 visualizes the communication patterns according to modularity. As seen in the figure, the modules are differentiated by node location, color, and layout. Since we assume that the vandalism was discovered on Sunday, we show the entire communication network for Sunday. We then investigate this network visualization pattern by pattern. Note that we found eight different patterns in the communication network.
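As a rough illustration of the measure itself, the sketch below evaluates the standard modularity score Q for a given partition of an undirected, unweighted graph: for each module, the fraction of edges inside it minus the fraction expected if edge ends were attached at random. It does not reproduce Gephi's computation, which also searches for a good partition; the function name and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its module label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the module
    degree_sum = defaultdict(int)  # total degree of nodes in the module
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_modules = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_module = {n: "a" for n in range(1, 7)}
```

Splitting the toy graph into its two triangles scores higher than treating it as a single module, which is the sense in which modularity hints at where to divide a network.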

Figure 2-1 Overview of communication network.

 

 

Figure 2-2 (a) Pattern 1: Communication pattern with the park service (ID 839736). (b) Pattern 2: Communication pattern with the park service (ID 1278894) and the external node.

 

Pattern 1 is shown in Figure 2-2 (a). This pattern covers all communications connected to the park service, ID 839736, which is marked in the figure. Since the majority of the communication involves the park service, this pattern is located in the center of the entire network. Most of the nodes in this network are connected to the park service.

Pattern 2 is presented in Figure 2-2 (b). This pattern covers the communications with the park service, ID 1278894, and with the external node. Since there are many communications with ID 1278894 and the external node, this network is also placed in the center of the entire network visualization. The main difference between the two patterns is that Pattern 1 is more irregular than Pattern 2 over time on Sunday. The communication counts for ID 839736 varied strongly over time on Sunday because of the vandalism, whereas the counts for ID 1278894 and the external node varied only slightly.

 

Figure 2-3 (a) Pattern 3: Communications between people in the pattern group. (b) Pattern 4: Different communications between the group and the park service or the external node.

 

Pattern 3, shown in Figure 2-3 (a), consists of communications between people within the pattern group. Since this group has no specific regular pattern, the nodes are located more or less randomly.

Pattern 4 comprises three kinds of communication: between the group and the park service or the external node, between the group and a certain person, and between people in the group. This is a regular pattern, as seen in Figure 2-3 (b).

 

 

Figure 2-4 (a) Pattern 5: Communication networks only with park service and between people in the group. (b) Pattern 6: Small communication networks with 4 people.

 

 

Figure 2-5 (a) Pattern 7: Small communication networks with 3 people. (b) Pattern 8: Small communication networks with 2 people.

 

Pattern 5 is similar to Pattern 4, but the communications occur only between the group and the park service and between people in the group. This is shown in Figure 2-4 (a).

Patterns 6, 7, and 8 are small communication networks whose groups contain only a few people. Groups of 4, 3, and 2 people are presented in Figures 2-4 (b), 2-5 (a), and 2-5 (b), respectively.

In order to analyze the communication patterns over time, we plot the counts of the communications separated by the locations.
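A time series of this kind can be built by flooring each timestamp to a fixed interval and counting records per location and interval. The sketch below is only an illustration under assumed names: the record layout, the 5-minute bin width, the timestamp format, and the placeholder dates are all hypothetical, not taken from the data set.

```python
from collections import Counter
from datetime import datetime


def bin_counts(rows, minutes=5):
    """Count records per (location, time bin).

    rows: iterable of (timestamp_string, location) pairs.
    minutes: bin width; assumed to divide 60 evenly.
    """
    counts = Counter()
    for ts, location in rows:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Floor the timestamp to the start of its bin.
        start = t.replace(minute=t.minute - t.minute % minutes, second=0)
        counts[(location, start)] += 1
    return counts


# Toy rows with placeholder dates, invented for illustration.
toy_rows = [
    ("2000-01-01 11:31:10", "Wet Land"),
    ("2000-01-01 11:33:59", "Wet Land"),
    ("2000-01-01 11:35:00", "Wet Land"),
    ("2000-01-01 11:32:00", "Coaster Alley"),
]
```

Plotting one line per location over the bin start times gives a chart comparable in shape to the per-location figures in this section.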

 

 

 

Figure 2-6 Communication patterns of Pattern 2.

 

Figure 2-6 shows all records included in Pattern 2. No abnormal pattern is visible in this figure. Although we do not show the graph for Pattern 1 here, it is similar to Figure 1-3.

 

 

 

Figure 2-7 Communication patterns of Pattern 3.

Figure 2-7 presents the communication patterns of Pattern 3. Although the graph does not seem to contain any abnormal pattern, there are spikes whenever Scott Jones' showcase takes place in Wet Land and Coaster Alley. This might indicate that many of his fans are grouped in Pattern 3. This group was also visible every day (Friday, Saturday, and Sunday).

 

 

Figure 2-8 Communication pattern of Pattern 4.

Figure 2-8 shows the communication patterns of Pattern 4. As seen in the figure, the group in Pattern 4 was visible only on a specific day, which is Sunday in this figure. We guess that this group consists of ordinary park visitors who communicated heavily right after the vandalism was discovered in Wet Land.

 

 

 

Figure 2-9 Communication pattern of Pattern 7 Example 1.

As mentioned earlier, Patterns 6, 7, and 8 are similar but differ in the number of people in the group. We extracted one group from Pattern 7 and plotted its communication patterns in Figure 2-9. People in this group seemed to stay all day long on Saturday and Sunday, and we did not find any abnormal communication pattern.

 

 

 

Figure 2-10 Communication pattern of Pattern 7 Example 2.

Similar to Figure 2-9, we picked another group from Pattern 7, presented in Figure 2-10. We suspect that people in this group were related to the vandalism. First, they came directly to Wet Land on Saturday morning and then left. They came back on Sunday and stayed in Wet Land when the vandalism was discovered. It seems that they prepared and rehearsed the crime on Saturday and then carried it out on Sunday.

 

 

 

 

 

 

 

MC2.3 – From this data, can you hypothesize when the crime was discovered?  Describe your rationale.

 

Limit your response to no more than 3 images and 300 words. 

 

 

We hypothesize that the crime was discovered between 11:30am and 11:45am on Sunday. To derive this hypothesis, we first computed the entropy of every person in the communication data to extract irregularities in the communication patterns. We then counted the number of people whose entropy was greater than 0.5 and plotted the result in Figure 3-1 (a). This figure shows a severe irregularity between 11am and 12pm on Sunday. Assuming the vandalism was discovered around that time, we further investigated where it was found, using X-Y plots of entropy vs. the number of communications in Figure 3-1 (b-e). Note that we differentiated the locations by color. There is no outstanding pattern in (b-d), but many people in Wet Land produced a large number of communications, as seen in (e). We then confirmed our hypothesis using different subsets of the data. To check the time when the vandalism was discovered, we searched for abnormal communication patterns of ID 839736, which we identified as a Q&A hotline. Figure 3-2 compares the number of communications by all visitors with the number of communications by ID 839736. The patterns for the visitors are more or less similar every day, but the pattern for ID 839736 is very different around noon on Sunday. This implies that a lot of answering was needed right after the vandalism was discovered. We also plotted the number of communications over time by location, excluding the park service (ID 839736 and ID 1278894), in Figure 3-3. From this figure, we can guess that Scott Jones' showcase took place at 11am and 4pm every day in Coaster Alley and that the vandalism was discovered between 11:33am and 11:41am on Sunday in Wet Land.
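The text does not state which distribution the entropy is computed over, so the sketch below makes an explicit assumption: each sender's entropy is the Shannon entropy of that sender's distribution over recipients, normalized to the range 0 to 1 so that the 0.5 threshold is meaningful. The function names and toy IDs are also hypothetical, and this should be read as one plausible reading rather than the method actually used.

```python
import math
from collections import Counter, defaultdict


def normalized_entropy(counts):
    """Shannon entropy of a count distribution, scaled to [0, 1]."""
    total = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0  # a single recipient carries no uncertainty
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)


def count_irregular(rows, threshold=0.5):
    """Number of senders whose recipient entropy exceeds the threshold.

    rows: iterable of (sender_id, receiver_id) pairs, e.g. the records
    that fall inside one time window.
    """
    per_sender = defaultdict(Counter)
    for sender, receiver in rows:
        per_sender[sender][receiver] += 1
    return sum(
        1 for c in per_sender.values() if normalized_entropy(c) > threshold
    )


# Toy rows, invented for illustration.
toy_rows = (
    [("A", "B"), ("A", "C")]   # evenly spread: entropy 1.0
    + [("D", "E")] * 3         # one recipient: entropy 0.0
    + [("F", "G")] * 9 + [("F", "H")]  # heavily skewed: below 0.5
)
```

Applying count_irregular to the records of each time window in turn would yield a series comparable in form to Figure 3-1 (a).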

 

Figure 3-1 (a) Number of people whose entropies were greater than 0.5. (b)-(e) X-Y plots for entropy vs. the number of communications.

 

 

Figure 3-2 Communication counts. The blue series includes all communications; the dark grey series includes only those between the park service (ID 839736) and visitors.

 

 

Figure 3-3 Communication trends by locations excluding the park service (ID 839736 and 1278894).